NSF PAR Search | NSF Public Access Repository

A text-guided protein design framework

https://doi.org/10.1038/s42256-025-01011-z

Liu, Shengchao; Li, Yanjing; Li, Zhuoxinran; Gitter, Anthony; Zhu, Yutao; Lu, Jiarui; Xu, Zhao; Nie, Weili; Ramanathan, Arvind; Xiao, Chaowei; et al. (March 2025, Nature Machine Intelligence)

Current AI-assisted protein design utilizes mainly protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in text format describing proteins’ high-level functionalities, yet whether the incorporation of such text data can help in protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multimodal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representation of two modalities, a facilitator that generates the protein representation from the text modality and a decoder that creates the protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441,000 text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
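The abstract says that ProteinCLAP aligns the text and protein representations but does not state the training objective. The sketch below assumes a CLIP-style symmetric contrastive (InfoNCE) loss over paired embeddings. The function names, the `temperature` value, and the toy vectors are illustrative assumptions, not taken from the paper.

```python
import math


def _normalize(v):
    # Scale a vector to unit length (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_emb, protein_emb, temperature=0.1):
    # text_emb[i] and protein_emb[i] are assumed to describe the same protein.
    t = [_normalize(v) for v in text_emb]
    p = [_normalize(v) for v in protein_emb]
    # Cosine similarity between every text and every protein, scaled.
    logits = [
        [sum(a * b for a, b in zip(ti, pj)) / temperature for pj in p]
        for ti in t
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the text-to-protein and protein-to-text directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Toy usage: matched pairs should score a lower loss than mismatched pairs.
text = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = symmetric_contrastive_loss(text, [[1.0, 0.0], [0.0, 1.0]])
loss_mismatched = symmetric_contrastive_loss(text, [[0.0, 1.0], [1.0, 0.0]])
```

In a real pipeline the embeddings would come from trained text and protein encoders, and the loss would be minimized with a gradient-based optimizer.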
Free, publicly-accessible full text available March 27, 2026
Shaping the Water-Harvesting Behavior of Metal–Organic Frameworks Aided by Fine-Tuned GPT Models

https://doi.org/10.1021/jacs.3c12086

Zheng, Zhiling; Alawadhi, Ali H; Chheda, Saumil; Neumann, S Ephraim; Rampal, Nakul; Liu, Shengchao; Nguyen, Ha L; Lin, Yen-hsu; Rong, Zichao; Siepmann, J Ilja; et al. (December 2023, Journal of the American Chemical Society)

Full Text Available
Attentive Walk-Aggregating Graph Neural Networks

Demirel, Mehmet F.; Liu, Shengchao; Garg, Siddhant; Shi, Zhenmei; Liang, Yingyu (April 2022, Transactions on Machine Learning Research)

Full Text Available
Bad Global Minima Exist and SGD Can Reach Them

Liu, Shengchao; Papailiopoulos, Dimitris; Achlioptas, Dimitris (January 2020, Advances in Neural Information Processing Systems)
Several works have aimed to explain why overparameterized neural networks generalize well when trained by Stochastic Gradient Descent (SGD). The consensus explanation that has emerged credits the randomized nature of SGD for the bias of the training process towards low-complexity models and, thus, for implicit regularization. We take a careful look at this explanation in the context of image classification with common deep neural network architectures. We find that if we do not regularize explicitly, then SGD can be easily made to converge to poorly-generalizing, high-complexity models: all it takes is to first train on a random labeling on the data, before switching to properly training with the correct labels. In contrast, we find that in the presence of explicit regularization, pretraining with random labels has no detrimental effect on SGD. We believe that our results give evidence that explicit regularization plays a far more important role in the success of overparameterized neural networks than what has been understood until now. Specifically, by penalizing complicated models independently of their fit to the data, regularization affects training dynamics also far away from optima, making simple models that fit the data well discoverable by local methods, such as SGD.
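The two-phase protocol described in the abstract (train on a random labeling first, then switch to the correct labels) can be sketched as follows. This is only an outline of the procedure, not the authors' code: `train_fn`, `random_relabel`, and `two_phase_training` are hypothetical names, and `train_fn` stands in for whatever SGD training loop is used, with or without explicit regularization.

```python
import random


def random_relabel(labels, num_classes, seed=0):
    # Replace every label with a class drawn uniformly at random.
    rng = random.Random(seed)
    return [rng.randrange(num_classes) for _ in labels]


def two_phase_training(train_fn, params, inputs, labels, num_classes, seed=0):
    # train_fn(params, inputs, labels) -> params is a hypothetical training
    # routine supplied by the caller (for example, SGD until convergence).
    noisy_labels = random_relabel(labels, num_classes, seed)
    params = train_fn(params, inputs, noisy_labels)  # phase 1: random labels
    params = train_fn(params, inputs, labels)        # phase 2: correct labels
    return params


# Toy usage with a stand-in train_fn that only records the labels it saw.
def recording_train_fn(params, inputs, labels):
    return params + [list(labels)]


history = two_phase_training(
    recording_train_fn, [], inputs=[0, 1, 2, 3], labels=[0, 1, 0, 1], num_classes=2
)
```

To compare the two settings in the abstract, the same routine would be run once with a `train_fn` that applies explicit regularization and once with one that does not, evaluating each resulting model on held-out data.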
Full Text Available